OrfMapper: A Web-Based Application 
for Visualizing Gene Clusters on 
Metabolic Pathway Maps 
Abstract

Computational analyses of, e.g., genomic, proteomic, or metabolomic data commonly result in one or more sets of candidate genes, proteins, or enzymes. These sets are often the outcome of clustering algorithms. Subsequently, it has to be tested whether, e.g., the candidate gene products are members of known metabolic processes. With OrfMapper we provide a powerful but easy-to-use, web-based database application that supports such analyses. All services provided by OrfMapper are freely available at http://www.orfmapper.com
Introduction

The amount of sequence-related data has increased dramatically during the past years. This is due to improvements in high-throughput and computational methods in the *omics fields, which often yield long lists of gene, protein, or enzyme identifiers (IDs). In our laboratory we process different kinds of sequence-based data, e.g., gene-expression data derived from DNA microarrays. The ultimate purpose of any gene-expression experiment is to produce biological knowledge. Regardless of the methods used, the result of a microarray experiment is, in most cases, a set of genes found to be differentially expressed between two or more conditions under study. The challenge faced by the researcher is to translate this list of differentially regulated genes into a better understanding of the biological phenomena that generate such changes. A good first step in that direction is the translation of the sequence ID list into a functional profile. Biological pathways can provide key information about the organization of biological systems. Major publicly available biological pathway diagram resources, including the Kyoto Encyclopedia of Genes and Genomes (KEGG) [1], GenMAPP, and BioCarta¹, can be used to map sequence data onto pathway maps. In this manuscript we do not intend to review existing solutions but focus on our approach.

Our project requires analyzing lists of sequence clusters and extending the analysis to as many organisms as possible. KEGG currently provides adapted maps for over 380 species, covering the following molecular interaction and reaction networks: metabolism, genetic information processing, environmental information processing, cellular processes, and human diseases.

¹ http://www.biocarta.com

In order to use the KEGG pathway database to display and map genes to KEGG pathways, we developed a web-based tool called OrfMapper. OrfMapper is an easy-to-use but powerful application that supports data analysis by extracting annotations for given keywords and gene, protein, or enzyme IDs, allocating these IDs to metabolic pathways, and displaying them on pathway maps. Two color codes (box background and box border) can be assigned to the IDs; they can represent, e.g., sequence properties, organism identifiers, or cluster memberships. These color codes are used in the query output. The query results are displayed in hypertext format as a web page, prepared for download as tab-delimited raw text, and visualized on colored, hyperlinked KEGG metabolic pathway maps that can be downloaded in PDF format. Together with a version optimized for personal digital assistants (PDAs), OrfMapper provides unique functionality for accessing and displaying KEGG pathway data.

Implementation 

Technical Background 

OrfMapper has been developed entirely with PHP² (version 4.3), an open-source scripting language that is especially suited for Internet development. PDF files are created with FPDF³ (version 1.53), a freely available PHP class for generating PDF files. OrfMapper runs on the Apple Mac OS X 10.2 operating system with an Apache HTTP server⁴ (version 1.3.33). The processed KEGG data are stored in a local relational MySQL database⁵ (version 4.1.13).

² http://www.php.net
³ http://www.fpdf.org

Database & Updates 

The database behind OrfMapper contains gene identifiers together with their annotations, organisms, and pathway information. The database is updated monthly. For each update, information from the KEGG FTP server⁶ and from the KEGG web site⁷ is parsed. To keep OrfMapper working and to avoid user query errors during updates, duplicated tables are used. Upon successful download and processing, the updated tables are activated while the outdated tables are inactivated.
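The text does not say how tables are activated and inactivated. One common way to do it, assumed here, is to build each updated table under a staging name and then swap names in a single transaction. The sketch below uses Python's built-in sqlite3 module as a self-contained stand-in for MySQL; the function name `refresh_table`, the table names (`genes`, `genes_new`, `genes_old`), and the two-column schema are all hypothetical. In MySQL itself, a single `RENAME TABLE genes TO genes_old, genes_new TO genes` statement performs the same swap atomically.

```python
import sqlite3


def refresh_table(conn: sqlite3.Connection, name: str,
                  rows: list[tuple[str, str]]) -> None:
    """Rebuild table `name` from `rows` without interrupting readers.

    Hypothetical naming scheme: `<name>_new` is the staging copy and
    `<name>_old` keeps the outdated data until the next update.
    Table names are interpolated directly, so they must be trusted constants.
    """
    staging, old = f"{name}_new", f"{name}_old"
    cur = conn.cursor()

    # 1. Build the updated copy while queries still use the live table.
    cur.execute(f"DROP TABLE IF EXISTS {staging}")
    cur.execute(f"CREATE TABLE {staging} (gene_id TEXT, pathway TEXT)")
    cur.executemany(f"INSERT INTO {staging} VALUES (?, ?)", rows)
    conn.commit()

    # 2. Swap names in one explicit transaction: either both renames
    #    happen or neither does, so the live table never goes missing.
    cur.execute("BEGIN")
    try:
        cur.execute(f"DROP TABLE IF EXISTS {old}")
        cur.execute(f"ALTER TABLE {name} RENAME TO {old}")
        cur.execute(f"ALTER TABLE {staging} RENAME TO {name}")
        conn.commit()
    except sqlite3.Error:
        conn.rollback()
        raise
```

If downloading or parsing fails before step 2, the live table is untouched and the previous data remain online; the outdated copy is kept until the following update.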

Usage 
User Input 

OrfMapper was designed for prompt display of metabolic relations between gene products using KEGG pathway maps. Detailed online help guides new users through the user interface. The user has to specify annotation keywords (e.g., "hydrogenase protein" or CoxA), gene IDs (e.g., KEGG, NCBI, UniProt), or enzyme IDs (i.e., EC numbers). The user input can be uploaded as an ASCII text file, exported from spreadsheet applications (e.g., Microsoft Excel or OpenOffice Calc), or pasted directly into a text area on the web page.

⁴ http://www.apache.org
⁵ http://www.mysql.com
⁶ ftp://ftp.genome.jp/pub/kegg/
⁷ http://www.genome.ad.jp/kegg/



Data Format 

OrfMapper is designed to be as flexible as possible in handling individual input data formats. The IDs can be listed vertically (one per line), horizontally, or in any mixture of the two. They can be separated by all typical text delimiters, e.g., tabs, spaces, commas, and semicolons. Placing several keywords within quotation marks forces OrfMapper to combine them in a boolean AND query.
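A minimal tokenizer for these input rules might look like the following. It is a hypothetical sketch, not OrfMapper's PHP code: the regular expression, the function name `parse_query`, and the choice to return each AND group as a list of words are assumptions. Suffixes joined to a token without a delimiter (such as the double-underscore color codes OrfMapper accepts) stay attached to that token; attaching a color to a quoted phrase is not handled here.

```python
import re

# One token is either a quoted phrase or a run of characters that contains
# no delimiter (whitespace, comma, semicolon) and no quotation mark.
TOKEN = re.compile(r'"([^"]*)"|([^\s,;"]+)')


def parse_query(text: str) -> list[list[str]]:
    """Split free-form input into query terms.

    Each term is a list of words that must all match (boolean AND).
    Unquoted tokens become single-word terms.
    """
    terms: list[list[str]] = []
    for quoted, bare in TOKEN.findall(text):
        if bare:
            terms.append([bare])
        else:
            words = quoted.split()
            if words:  # ignore empty quotes ("")
                terms.append(words)
    return terms
```

Because newlines count as whitespace, vertical, horizontal, and mixed layouts all reduce to the same token stream.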

Organism Selection 

By default, all organisms are queried for all entered IDs and keywords. To restrict the output to selected organisms, these organisms can be specified in the first input line. This line must begin with the angle brackets ">>", followed by organism names or just parts of organism names (e.g., "droso" instead of "Drosophila melanogaster"). The organism names must be separated by commas. If no organism name matches, all organisms are queried.
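The organism filter can be sketched as follows. The text only states that partial names are accepted; the case-insensitive substring match used here, the function name `select_organisms`, and passing in a list of known organism names are assumptions for illustration.

```python
ORGANISM_MARKER = ">>"  # prefix of the organism line, as given in the text


def select_organisms(lines: list[str],
                     known: list[str]) -> tuple[list[str], list[str]]:
    """Return (organisms to query, remaining input lines).

    A name fragment selects every known organism whose name contains it,
    ignoring case, so "droso" selects "Drosophila melanogaster".
    Without a marker line, or if nothing matches, all organisms are used.
    """
    if not lines or not lines[0].lstrip().startswith(ORGANISM_MARKER):
        return list(known), lines
    spec = lines[0].lstrip()[len(ORGANISM_MARKER):]
    parts = [p.strip().lower() for p in spec.split(",") if p.strip()]
    selected = [org for org in known if any(p in org.lower() for p in parts)]
    # Fall back to the full organism list when no fragment matched.
    return (selected or list(known)), lines[1:]
```

For example, with the first line `>>droso, coli`, both Drosophila melanogaster and any Escherichia coli strain in the known list would be selected, and the remaining lines are passed on as the ID query.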

Coloration 

To customize the visualization, the user may specify colors for individual IDs. To do so, either a color name (e.g., yellow, blue, red) or a hexadecimal RGB code (e.g., #FFFF00) can be appended to IDs and keywords with two underscore characters "__" (e.g., genename__blue, genename__#000080, keyword1__red, "keyword1 keyword2"__green). This colors the enzyme box corresponding to the ID on a KEGG pathway map. Likewise, the user can add one additional value to change the box border color. This is achieved by appending another color to the ID, again preceded by two underscore characters (e.g., genename__blue__red). Coloration is particularly helpful for marking gene products that share common properties, such as expression levels or cluster affiliation, and for identifying them in the output.

Spreadsheet Import

Large sets of query data are often stored in spreadsheet or database applications, e.g., Microsoft Excel, OpenOffice Calc, or Microsoft Access. Thus, we took special care to simplify data import from these applications. If the data are organized in three columns (ID, box color, and box border color, respectively), they can be copied and pasted directly into OrfMapper. Upon clicking the Convert Tab button, all tabs are converted to underscores, as required.

Output

OrfMapper creates three forms of output: hypertext, raw tab-delimited text, and graphical PDF pathway maps. The hypertext query result contains all gene annotations, pathway information, and hyperlinks to the KEGG pathway maps corresponding to the user-defined query (Fig. 1). This output is sorted by organism names, metabolic categories, pathways, and gene products. The latter two levels are hyperlinked to the corresponding KEGG information pages. This query result can be downloaded as a raw tab-delimited text file for further processing. The first line of the text file contains the IDs given by the user. All following lines contain the full set of query results with the following entries: sequence or enzyme ID, KEGG species sequence ID, annotation with EC number and KEGG orthology ID, KEGG organism ID, species name, KEGG pathway map number, metabolic pathway name, box background color, and box border color.

Upon clicking the document symbol in the hypertext query results, OrfMapper creates a PDF version of the corresponding KEGG pathway map. The graphical PDF map can be saved locally, is scalable, is optimized for printing, and includes hyperlinks to KEGG metabolite and enzyme information. If colors were assigned to sequence IDs in the query input, the backgrounds and borders of the enzyme boxes are colored in the PDF maps. Each PDF is oriented (portrait or landscape) so that the KEGG pathway map fits the page.

Discussion



OrfMapper was designed for displaying pathway-oriented metabolic information for keywords and for nucleotide, protein, or enzyme IDs of sequenced organisms. Numerous visualization tools for analyzing biological data are available. OrfMapper fills a gap by providing quick access to pathway information via a single input field with flexible input formats and output coloration options.

KEGG itself provides an integrated tool that can be used to color metabolic pathway objects. However, OrfMapper offers much broader functionality: it allows cross-species queries, gives more detailed output, hyperlinks individual genes, and converts the colored pathway maps to PDF format while retaining the hyperlinks.

A condensed version of OrfMapper, which requires less screen space and shows reduced output, is designed for palm-sized PDAs. Its display is scaled to a width of 240 pixels, and the output of gene annotations is omitted. On a PDA equipped with WLAN, this allows on-the-spot information retrieval and mapping of keywords and gene or enzyme IDs, e.g., during research seminars.

Figure 1: OrfMapper GUI. The query is either uploaded from a local file (1) or typed/pasted into the input field (2). Results are visualized as HTML or can be downloaded as a tab-delimited file (3). Hits are organized by organisms (4), metabolisms (5), submetabolisms (6), and enzymes (7). Pathway maps with colored hits can be downloaded as PDF (8), and gene information can be retrieved (9, 10).

OrfMapper's functionality will be expanded continuously. While the simple graphical user interface and query syntax will stay unchanged, extensions concerning the use of functional characters are planned. We are currently integrating further sequence IDs, e.g., from the Protein Data Bank (PDB). Furthermore, we plan to support querying with nucleotide and protein sequences.